ABSTRACT 

The relationship of sample size to number of 
variables in the use of factor analysis has been treated by many 
investigators. In attempting to explore what the minimum sample size 
should be, none of these investigators pointed out the constraints 
imposed on the dimensionality of the variables by using a sample size 
smaller than the number of variables. A review of studies in this 
area is made as well as suggestions for resolution of the problem. 
(Author) 

THE RELATION OF SAMPLE SIZE TO THE NUMBER OF 

VARIABLES IN USING FACTOR ANALYSIS TECHNIQUES 

Lawrence M. Aleamoni^ 

One of the basic axioms of factor analysis is that the number of 
variables (V) in the correlation matrix should not exceed the number of 
observations (N). In fact, this axiom is so taken for granted by Thurstone 
(1947), Guilford (1954), and Harman (1960) that they state it without any 
supporting reasons. 

Of course, the most obvious reason for having N greater than V is that 
otherwise one restricts the maximum number of linearly independent factors 
that can be extracted from the correlation matrix. This can readily be shown 
by the following: 

The rank of a matrix is the maximum number of linearly independent 
row (or column) vectors of that matrix (Murdoch, 1957). The rank 
of the product of two matrices is less than or equal to the rank of 
either matrix. Whenever the number of rows is not equal to the 
number of columns of a matrix the maximum number of linearly 
independent vectors is equal to the smaller number of rows or columns 
of that matrix. Thus, the maximum number of linearly independent 
factors that can be extracted from a correlation matrix R is equal 
to the rank of either matrix that is used to generate R. 

Now the correlation matrix R equals FF' and by a well-known theorem, 
the rank of R is less than or equal to the rank of F' or F, whichever is 
smaller. But F and F' are mutual transposes and so their ranks are equal 
and, therefore, the rank of F equals the rank of R (for example, see Tatsuoka, 
1971, pp. 133-134). 

Furthermore, NR = ZZ', where Z is the standard score matrix, and since we 
are concerned only with the ranks of the matrices, the non-zero factor N is 
irrelevant and the rank of R equals the rank of Z (for example, see Harman, 
1960, pp. 62-63). 




^The author wishes to gratefully acknowledge the contribution of Mr. Nick L. 
Smith in the generation of this paper. 



Finally, since the rank of a matrix is never greater than its smaller 
dimension (for example, see Horst, 1963, p. 334), the rank of Z is less than 
or equal to the smaller of N or V. Therefore, the maximum number of linearly 
independent factors that can be extracted from the correlation matrix R is 
less than or equal to the smaller dimension, N or V, of the original standard 
score matrix. 
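
The rank argument above can be checked numerically. The following is a minimal sketch, not part of the original paper: the scores are made-up random integers, the values N = 17 and V = 62 are chosen only for illustration, and the helper names are hypothetical. It works with the deviation cross-product matrix, which has the same rank as R as long as no variable has zero variance, and it uses exact rational arithmetic so that the rank is not affected by rounding.

```python
from fractions import Fraction
import random


def deviation_scores(n_obs, n_vars, seed=0):
    """V x N matrix of integer deviation scores, scaled by N to stay integral."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_vars):
        x = [rng.randint(0, 9) for _ in range(n_obs)]  # made-up raw scores
        total = sum(x)
        rows.append([n_obs * xi - total for xi in x])  # N * (x - mean)
    return rows


def cross_products(d):
    """D D': a V x V matrix proportional to the covariance matrix."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in d] for ri in d]


def exact_rank(matrix):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            if m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


N_OBS, N_VARS = 17, 62  # illustrative values with N smaller than V
D = deviation_scores(N_OBS, N_VARS)
rank_scores = exact_rank(D)
rank_cross = exact_rank(cross_products(D))
print(rank_scores, rank_cross, min(N_OBS, N_VARS))
```

Because deviation scores sum to zero across observations, the rank in this sketch cannot exceed N - 1, which is slightly tighter than the bound of N stated in the text. Either way, the 62 variables cannot yield anything close to 62 independent dimensions.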

Humphreys (1964), in discussing Kaiser's rule of thumb for extracting only 
as many common factors as there are roots greater than one of the 
complete correlation matrix (i.e., with ones in the main diagonal), suggests 
that with a small N, even this criterion might result in retaining a factor 
which is dependent only on chance. By using a small N, one may be capitalizing 
on sampling error in interpreting factors. 
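
Humphreys' concern can be illustrated with a small simulation. This is a hedged sketch rather than anything taken from the studies cited: the data are independent random normal deviates, so the population correlation matrix is the identity and every population root equals one; N = 20 and V = 12 are assumed values; and the roots are found with a plain cyclic Jacobi routine written for this example, not with the Principal Axes programs used by the authors discussed here.

```python
import math
import random


def correlation_matrix(n_obs, n_vars, seed=0):
    """Correlation matrix of n_vars independent random normal variables."""
    rng = random.Random(seed)
    z = []
    for _ in range(n_vars):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        mean = sum(x) / n_obs
        dev = [xi - mean for xi in x]
        norm = math.sqrt(sum(d * d for d in dev))
        z.append([d / norm for d in dev])  # unit-length deviation vector
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]


def jacobi_eigenvalues(matrix, max_sweeps=100, tol=1e-18):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (
                    abs(theta) + math.sqrt(theta * theta + 1.0)
                )
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_count(eigenvalues):
    """Number of roots greater than one (Kaiser's rule of thumb)."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


N_OBS, N_VARS = 20, 12  # assumed values for illustration
R = correlation_matrix(N_OBS, N_VARS)
eigenvalues = jacobi_eigenvalues(R)
print(kaiser_count(eigenvalues), "of", N_VARS, "roots exceed one")
```

Since the roots of a correlation matrix sum to V, any departure of the sample matrix from the identity pushes at least one root above one, so the rule retains "factors" even though the data contain no common factors at all.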

Aleamoni (196'V) factor analyzed 66 observations on 62 variables using the 
Principal Axes procedure with Varimax rotation. Then, using a table of random 
numbers, he selected three subsamples of N = 51, N = 33, and N = 17, 
respectively, and attempted to factor analyze them. The subsample of 17 
observations could not be factored, however, since the communalities were 
greater than one and were not acceptable. He attributes this to either the 
small N of 17 or else computer error. 

The factor analysis of the subsamples of sizes 51 and 33 did produce 
interesting results, though. The subsample of N = 51 gave several factors 
quite similar to those of the total sample of N = 66. The subsample of 
N = 33, however, gave only two factors that were similar to those of the 
original sample. Aleamoni concluded that as N becomes less than or equal to 
V, the resultant intercorrelation matrices become less similar than those 
where N is larger than V. 



Humphreys, et al., (1969) state that, as yet, no minimum N can be 
specified, but that N should be as large as feasible so that factors are based 
on stable differences among correlations as well as on correlations that are 
significantly greater than zero. They recommend including the smallest 
number of variables which will still serve the purposes of the investigation, 
and limiting the number of factors extracted to one-quarter of the number of 
variables. They further suggest that a trade-off between the number of 
observations, number of variables, and number of factors is a reasonable 
procedure. If, for example, only a limited number of observations is possible 
then the number of variables studied and factors extracted should also be 
restricted. 

Even when N is relatively large, however, extracted factors can still be 
due to chance. Using N's of 48, 96, and 384, and V's of 12, 24, and 48, 
Humphreys, et al., (1969) have been able to construct apparently well-defined 
factors from intercorrelations of random normal deviates. They state: 

Empirically there are a great many factor-analytic investigations 
reported in which many of the variables have distributions of 
correlations that do not differ markedly from random distributions. 
(p. 268) 

This does not mean, of course, that all such factors are necessarily 
random, but only that better data are needed before one can confidently 
conclude that they are nonrandom. Of course, large sample sizes reduce the 
probability that factors are attributable entirely to chance. 
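
The effect of sample size on chance correlations can be sketched directly. The example below is an illustration under stated assumptions, not a reconstruction of the Humphreys et al. procedure: it draws independent random normal deviates for the three sample sizes mentioned above, fixes V = 24, and compares the observed spread of the intercorrelations with the theoretical standard deviation of a sample correlation between independent normal variables, 1/sqrt(N - 1). The function names are hypothetical.

```python
import math
import random
import statistics


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def null_correlations(n_obs, n_vars, seed=0):
    """All pairwise correlations among independent random normal variables."""
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(n_vars)]
    return [
        pearson(data[i], data[j])
        for i in range(n_vars)
        for j in range(i + 1, n_vars)
    ]


spread = {}
for n_obs in (48, 96, 384):
    rs = null_correlations(n_obs, n_vars=24)
    spread[n_obs] = statistics.pstdev(rs)  # observed spread of chance correlations
    expected = 1.0 / math.sqrt(n_obs - 1)  # theoretical spread under independence
    print(n_obs, round(spread[n_obs], 3), round(expected, 3))
```

A set of observed correlations whose spread is no wider than the theoretical column is, in the sense of the quotation above, not distinguishable from a random distribution, and that spread narrows only slowly as N grows.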

Solomon (1966) followed this general approach of getting additional data 
to confirm the existence of factors in his study of teacher behavior dimensions. 
Solomon (Solomon, Bezdek, and Rosenberg, 1964; and Solomon, Rosenberg, and 
Bezdek, 1964) first of all factor analyzed 24 observations on 169 variables 
and extracted eight factors. He stated that he realized he had violated the 
N greater than V requirement, but that this study was exploratory and that the 
number of variables was kept large "...so that the possibility of obtaining 
new and/or more subtle dimensions than have emerged previously would be 
maximized" (Solomon, Bezdek, and Rosenberg, 1964, p. 32). As has already been 
pointed out, however, with N less than V, it is the size of N that determines 
the number of possible dimensions, and not V. 

Solomon (1966), however, does go on to attempt to show that his factors 
are nonrandom. He factor analyzed 69 variables (items which were selected 
according to their loadings on the questionable factor analysis of the previous 
study) with a sample of 229 observations. Solomon reports that seven of the 
eight previous factors appeared again, in addition to four new ones. 

This use of replication to show that results are nonrandom is, of course, 
a common and accepted practice. Cohen and Guthrie (1966), for example, in 
studying motivation patterns of college attendance, factor analyzed 105 
variables on two samples of 105 and 95 observations, respectively. Although 
both sample sizes were probably much too small, they attempted to use the 
results of the smaller sample to confirm the results of the larger. They 
report that only six of the ten factors described in the first analysis were 
confirmed by the second analysis. 

Although it appears on the surface that replications may help somewhat 
in confirming that factors defined from analyses with small N are nonrandom, 
Humphreys, et al., (1969) strongly caution: 

Replicability, which is the mainstay of the scientific method, is 
hopeless in factor analysis studies unless hedged about with more 
controls than is commonly the case. It is clear that with appropriate 
values of N, n [number of variables], and m [number of factors extracted], 
the Procrustes method, either oblique or orthogonal, could replicate 
random factors endlessly. (p. 269) 
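
The mechanism behind this caution can be shown in a deliberately reduced form. The sketch below is an assumption-laden illustration, not the procedure of Humphreys et al.: it is limited to two factors and to proper orthogonal rotations (no reflections), for which the best-fitting Procrustes angle has a closed form, and both "loading" matrices are pure random numbers with hypothetical dimensions.

```python
import math
import random


def random_loadings(n_vars, seed):
    """n_vars x 2 matrix of random 'loadings' carrying no real structure."""
    rng = random.Random(seed)
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_vars)]


def procrustes_angle(a, b):
    """Rotation angle that brings a closest to target b in least squares."""
    c = sum(a1 * b1 + a2 * b2 for (a1, a2), (b1, b2) in zip(a, b))
    s = sum(a2 * b1 - a1 * b2 for (a1, a2), (b1, b2) in zip(a, b))
    return math.atan2(s, c)


def rotate(a, angle):
    """Apply a proper orthogonal rotation to a two-factor loading matrix."""
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    return [(a1 * cos_t + a2 * sin_t, -a1 * sin_t + a2 * cos_t) for a1, a2 in a]


def misfit(a, b):
    """Sum of squared differences between two loading matrices."""
    return sum((a1 - b1) ** 2 + (a2 - b2) ** 2 for (a1, a2), (b1, b2) in zip(a, b))


first = random_loadings(n_vars=6, seed=1)   # "original study"
second = random_loadings(n_vars=6, seed=2)  # "replication", unrelated noise
angle = procrustes_angle(second, first)
before = misfit(second, first)
after = misfit(rotate(second, angle), first)
print(round(before, 3), round(after, 3))
```

The rotated misfit can never exceed the unrotated one, even though the two matrices share nothing. With more factors relative to variables the rotation has more freedom, which is why agreement obtained by targeted rotation is weak evidence of replication unless it is compared with what random matrices achieve.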



Thus, investigators seem to know that sample sizes should be larger 
than the number of variables before legitimately doing a factor analysis, but 
it appears from these studies that they frequently do not understand why this 
is the case and so tend to violate the constraint. Nor does replication 
seem to be an entirely satisfactory way to compensate for a small sample 
size. 

The only recourse seems to be for investigators more strictly to adhere 
to the restriction of not using factor analysis unless the sample size is 
considerably larger than the number of variables. Even though previous 
investigators have stated that no minimum N can be specified, if one is 
interested in maximizing the number of possible dimensions underlying V, 
then N must be at least greater than V. 